Annotation by Category: ELAN and ISO DCR
Abstract
The Data Category Registry (DCR) is one of the ISO initiatives towards establishing standards for the management, creation and coding of language resources. Successful application of the DCR depends on the availability of tools that can interact with it. This paper describes the first steps taken to give users of the multimedia annotation tool ELAN the means to create references from tiers and annotations to data categories defined in the ISO Data Category Registry. It first briefly describes the capabilities of ELAN and the structure of the documents it creates. After a concise overview of the goals and current state of the ISO DCR infrastructure, it describes how the preliminary connectivity with the DCR is implemented in ELAN.
Similar resources
A Registry of Standard Data Categories for Linguistic Annotation
In this paper we describe the most recent work within ISO TC37/SC 4, and in particular the development of a Data Category Registry (DCR) component of the Linguistic Annotation Framework. The DCR will contain a formally defined set of linguistic categories in common use within the language engineering community for reference and use in linguistically annotated resources. We outline the first pro...
A Global Data Category Registry for Interoperable Language Resources
ISO TC 37 is creating a Data Category Registry (DCR) as an online open-source RDF-based resource for use by implementers of electronic language resources, including terminologies, presentational and non-presentational lexical resources, NLP lexica, etc. The DCR will allow dynamic generation of data category selections (DCSs), e.g., subsets of the collection reflecting various thematic domains a...
Towards standardized descriptions of linguistic features: ISOcat and procedures for using common data categories
Since 2009 the Max Planck Institute for Psycholinguistics in Nijmegen has offered a web-based open source reference implementation of the ISO DCR (Data Category Registry, ISO 12620:2009), which is called ISOcat (“Data Category Registry for ISO TC 37”). ISOcat describes the data model and procedures for the DCR. The talk presents the current stage of development and the status of ISOcat, and demons...
Annotating Multi-media/Multi-modal Resources with ELAN
This paper presents the current state of development of the manual annotation tool ELAN. It presents usage requirements from three different groups of users and shows how one annotation model and a number of generic design principles guided the choices made during the development process of ELAN. Introduction At the Max-Planck-Institute for Psycholinguistics (MPI) software development on annotation tools...
An API for accessing the Data Category Registry
Central Ontologies are increasingly important to manage interoperability between different types of language resources. This was the reason for ISO to set up a new committee ISO TC37/SC4 taking care of language resource management issues. Central to the work of this committee is the definition of a framework for a central registry of data categories that are important in the domain of language ...